Datenbank-Spektrum
Special focus: MapReduce Programming Model
Compilation of Query Languages into MapReduce
Efficient or Hadoop: Why Not Both?
Parallel Entity Resolution with Dedoop
Inkrementelle Neuberechnungen in MapReduce (Incremental Recomputations in MapReduce)
Technical contribution
Towards Integrated Data Analytics: Time Series Forecasting in DBMS
Abstract
… published/indexed in Google Scholar, Academic OneFile, DBLP, io-port.net, OCLC, Summon by Serial Solutions. Author guidelines for the journal Datenbank-Spektrum are available at www.springer.com/13222. Datenbank-Spektrum (2013) 13:1–3, DOI 10.1007/s13222-013-0116-z
Similar resources
Dedoop: Efficient Deduplication with Hadoop
We demonstrate a powerful and easy-to-use tool called Dedoop (Deduplication with Hadoop) for MapReduce-based entity resolution (ER) of large datasets. Dedoop supports a browser-based specification of complex ER workflows including blocking and matching steps as well as the optional use of machine learning for the automatic generation of match classifiers. Specified workflows are automatically t...
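The blocking-then-matching pattern named in the Dedoop abstract maps naturally onto a single MapReduce job: the map phase assigns each record a blocking key, the shuffle groups records by key, and the reduce phase compares only the pairs inside each block. The following is a minimal sketch that simulates such a job in plain Python under stated assumptions; it is not Dedoop's implementation, and the sample records, the three-letter blocking key, the `difflib` similarity measure and the 0.85 threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical input records: (record_id, name)
RECORDS = [
    (1, "Hadoop MapReduce"),
    (2, "Hadoop Map Reduce"),
    (3, "Apache Spark"),
    (4, "Apache Sparks"),
    (5, "Dedoop"),
]


def blocking_key(name: str) -> str:
    """Hypothetical blocking function: the first three lowercase characters."""
    return name.lower()[:3]


def map_phase(records):
    """Map: emit (blocking_key, record) so candidate duplicates share a reducer."""
    for rec in records:
        yield blocking_key(rec[1]), rec


def shuffle(pairs):
    """Group map output by key, as the MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (assumed match function)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reduce_phase(groups, threshold: float = 0.85):
    """Reduce: compare record pairs only within each block; return matching id pairs."""
    matches = []
    for block in groups.values():
        for (id_a, name_a), (id_b, name_b) in combinations(block, 2):
            if similarity(name_a, name_b) >= threshold:
                matches.append((id_a, id_b))
    return sorted(matches)


if __name__ == "__main__":
    print(reduce_phase(shuffle(map_phase(RECORDS))))
```

Restricting comparisons to blocks avoids the quadratic all-pairs comparison, at the cost of missing duplicates whose blocking keys differ: a variant stored as "The Hadoop MapReduce" would receive the key "the" and never be compared with record 1.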
Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations, because the rapid development in the use of information technology in general and network technology in particular has led to the trend of many organizations to make their applications available for use via electronic platforms hos...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
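The abstract notes that default Hadoop placement assumes every node has the same computing capacity and receives the same workload. The snippet below contrasts that assumption with a simple capacity-proportional split of data blocks. It is a generic illustration only, not the adaptive algorithm proposed in that paper (whose details are truncated here); the node names and capacity values are hypothetical, and largest-remainder rounding is one assumed way to keep the per-node counts summing to the total.

```python
def proportional_placement(num_blocks: int, capacity: dict[str, float]) -> dict[str, int]:
    """Assign block counts proportional to each node's relative capacity.

    Uses largest-remainder rounding so the counts always sum to num_blocks.
    Capacities are hypothetical relative speeds (e.g. measured task rates).
    """
    total = sum(capacity.values())
    # Ideal (fractional) share of blocks for each node.
    quotas = {node: num_blocks * cap / total for node, cap in capacity.items()}
    # Start from the integer part of each share.
    counts = {node: int(q) for node, q in quotas.items()}
    leftover = num_blocks - sum(counts.values())
    # Hand out the remaining blocks to nodes with the largest fractional remainders.
    by_remainder = sorted(quotas, key=lambda n: quotas[n] - counts[n], reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts


if __name__ == "__main__":
    # Homogeneous cluster: equal capacities give the even split Hadoop assumes.
    print(proportional_placement(12, {"node-a": 1.0, "node-b": 1.0, "node-c": 1.0}))
    # Heterogeneous cluster: faster nodes receive proportionally more blocks.
    print(proportional_placement(12, {"node-a": 3.0, "node-b": 2.0, "node-c": 1.0}))
```

With equal capacities the function reduces to the even split described in the abstract; with capacities in the ratio 3:2:1 and 12 blocks it yields 6, 4 and 2 blocks, so a faster node is not left idle while slower nodes finish their equal shares.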
Efficient Big Data Processing in Hadoop MapReduce
This tutorial is motivated by the clear need of many organizations, companies, and researchers to deal with big data volumes efficiently. Examples include web analytics applications, scientific applications, and social networks. A popular data processing engine for big data is Hadoop MapReduce. Early versions of Hadoop MapReduce suffered from severe performance problems. Today, this is becoming...
SCALLA: A Platform for Scalable One-Pass Analytics using MapReduce
Today’s one-pass analytics applications tend to be data-intensive in nature and require the ability to process high volumes of data efficiently. MapReduce is a popular programming model for processing large datasets using a cluster of machines. However, the traditional MapReduce model is not well-suited for one-pass analytics, since it is geared towards batch processing and requires the data se...